Regressão Linear Simples com Julia

Fonte: Wikipedia - Regressão Linear Simples

Definição:

Em estatística , regressão linear simples é o quadrado mínimo estimador de um modelo de regressão linear com uma única variável explicativa. Em outras palavras, regressão linear simples se encaixa uma linha reta através do conjunto de pontos n de tal forma que faça a soma dos quadrados residuais do modelo ( isto é, as distâncias entre os pontos verticais do conjunto de dados e a linha reta) tão pequena quanto possível.

Para se estimar o valor esperado, usa-se de uma equação, que determina a relação entre ambas as variáveis:

$${\Large Y_i = \alpha + \beta X_i}$$

In [3]:
using Gadfly
using CurveFit
set_default_plot_size(23cm,10cm)

In [4]:
x=rand(100)
y=rand(100) .+ 1.6*(x .+ 20.0);

In [5]:
p=plot(x=x, y=y)


Out[5]:
x -1.5 -1.0 -0.5 0.0 0.5 1.0 1.5 2.0 2.5 -1.00 -0.95 -0.90 -0.85 -0.80 -0.75 -0.70 -0.65 -0.60 -0.55 -0.50 -0.45 -0.40 -0.35 -0.30 -0.25 -0.20 -0.15 -0.10 -0.05 0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50 0.55 0.60 0.65 0.70 0.75 0.80 0.85 0.90 0.95 1.00 1.05 1.10 1.15 1.20 1.25 1.30 1.35 1.40 1.45 1.50 1.55 1.60 1.65 1.70 1.75 1.80 1.85 1.90 1.95 2.00 -1 0 1 2 -1.0 -0.9 -0.8 -0.7 -0.6 -0.5 -0.4 -0.3 -0.2 -0.1 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0 29.0 29.5 30.0 30.5 31.0 31.5 32.0 32.5 33.0 33.5 34.0 34.5 35.0 35.5 36.0 36.5 37.0 37.5 29.5 29.6 29.7 29.8 29.9 30.0 30.1 30.2 30.3 30.4 30.5 30.6 30.7 30.8 30.9 31.0 31.1 31.2 31.3 31.4 31.5 31.6 31.7 31.8 31.9 32.0 32.1 32.2 32.3 32.4 32.5 32.6 32.7 32.8 32.9 33.0 33.1 33.2 33.3 33.4 33.5 33.6 33.7 33.8 33.9 34.0 34.1 34.2 34.3 34.4 34.5 34.6 34.7 34.8 34.9 35.0 35.1 35.2 35.3 35.4 35.5 35.6 35.7 35.8 35.9 36.0 36.1 36.2 36.3 36.4 36.5 36.6 36.7 36.8 36.9 37.0 27.5 30.0 32.5 35.0 37.5 29.4 29.6 29.8 30.0 30.2 30.4 30.6 30.8 31.0 31.2 31.4 31.6 31.8 32.0 32.2 32.4 32.6 32.8 33.0 33.2 33.4 33.6 33.8 34.0 34.2 34.4 34.6 34.8 35.0 35.2 35.4 35.6 35.8 36.0 36.2 36.4 36.6 36.8 37.0 y

Cálculo do fator α:


$${\large \hat{\alpha} = \frac{\sum{X^2}\sum{Y}-\sum{(XY)}\sum(X)}{n\sum{X^2}-(\sum{X})^2}}$$



In [6]:
α =  (sum(x .^ 2.0) * sum(y) - sum(x .* y) * sum(x)) / (length(x) * sum(x .^ 2.0) - sum(x)^2)


Out[6]:
32.52668663248782

Cálculo do fator β:


$${\large \hat{\beta} = \frac{n\sum{(XY)}-\sum{X}\sum{Y}}{n\sum{(X^2)}-(\sum{X})^2}}$$



In [7]:
β = (length(x)*sum(x .* y) - sum(x) * sum(y)) / (length(x)*sum(x .^ 2.0) - sum(x)^2)


Out[7]:
1.5819239757522379

Definição da função da reta:


In [8]:
yy(xx) = α + β*xx


Out[8]:
yy (generic function with 1 method)

Gráfico dos pontos com a reta encontrada


In [9]:
plot(layer(x=x, y=y, Geom.point), layer(yy, minimum(x), maximum(x), Geom.line, Theme(default_color=colorant"red")))


Out[9]:
x -1.5 -1.0 -0.5 0.0 0.5 1.0 1.5 2.0 2.5 -1.00 -0.95 -0.90 -0.85 -0.80 -0.75 -0.70 -0.65 -0.60 -0.55 -0.50 -0.45 -0.40 -0.35 -0.30 -0.25 -0.20 -0.15 -0.10 -0.05 0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50 0.55 0.60 0.65 0.70 0.75 0.80 0.85 0.90 0.95 1.00 1.05 1.10 1.15 1.20 1.25 1.30 1.35 1.40 1.45 1.50 1.55 1.60 1.65 1.70 1.75 1.80 1.85 1.90 1.95 2.00 -1 0 1 2 -1.0 -0.9 -0.8 -0.7 -0.6 -0.5 -0.4 -0.3 -0.2 -0.1 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0 29.0 29.5 30.0 30.5 31.0 31.5 32.0 32.5 33.0 33.5 34.0 34.5 35.0 35.5 36.0 36.5 37.0 37.5 29.5 29.6 29.7 29.8 29.9 30.0 30.1 30.2 30.3 30.4 30.5 30.6 30.7 30.8 30.9 31.0 31.1 31.2 31.3 31.4 31.5 31.6 31.7 31.8 31.9 32.0 32.1 32.2 32.3 32.4 32.5 32.6 32.7 32.8 32.9 33.0 33.1 33.2 33.3 33.4 33.5 33.6 33.7 33.8 33.9 34.0 34.1 34.2 34.3 34.4 34.5 34.6 34.7 34.8 34.9 35.0 35.1 35.2 35.3 35.4 35.5 35.6 35.7 35.8 35.9 36.0 36.1 36.2 36.3 36.4 36.5 36.6 36.7 36.8 36.9 37.0 27.5 30.0 32.5 35.0 37.5 29.4 29.6 29.8 30.0 30.2 30.4 30.6 30.8 31.0 31.2 31.4 31.6 31.8 32.0 32.2 32.4 32.6 32.8 33.0 33.2 33.4 33.6 33.8 34.0 34.2 34.4 34.6 34.8 35.0 35.2 35.4 35.6 35.8 36.0 36.2 36.4 36.6 36.8 37.0 y

Uso da biblioteca


In [10]:
α, β = CurveFit.linear_fit(x, y)


Out[10]:
2-element Array{Float64,1}:
 32.5267 
  1.58192

In [11]:
yyy(xx) = α + β*xx


Out[11]:
yyy (generic function with 1 method)

In [12]:
plot(layer(x=x, y=y, Geom.point), layer(yyy, minimum(x), maximum(x), Geom.line, Theme(default_color=colorant"red")))


Out[12]:
x -1.5 -1.0 -0.5 0.0 0.5 1.0 1.5 2.0 2.5 -1.00 -0.95 -0.90 -0.85 -0.80 -0.75 -0.70 -0.65 -0.60 -0.55 -0.50 -0.45 -0.40 -0.35 -0.30 -0.25 -0.20 -0.15 -0.10 -0.05 0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50 0.55 0.60 0.65 0.70 0.75 0.80 0.85 0.90 0.95 1.00 1.05 1.10 1.15 1.20 1.25 1.30 1.35 1.40 1.45 1.50 1.55 1.60 1.65 1.70 1.75 1.80 1.85 1.90 1.95 2.00 -1 0 1 2 -1.0 -0.9 -0.8 -0.7 -0.6 -0.5 -0.4 -0.3 -0.2 -0.1 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0 29.0 29.5 30.0 30.5 31.0 31.5 32.0 32.5 33.0 33.5 34.0 34.5 35.0 35.5 36.0 36.5 37.0 37.5 29.5 29.6 29.7 29.8 29.9 30.0 30.1 30.2 30.3 30.4 30.5 30.6 30.7 30.8 30.9 31.0 31.1 31.2 31.3 31.4 31.5 31.6 31.7 31.8 31.9 32.0 32.1 32.2 32.3 32.4 32.5 32.6 32.7 32.8 32.9 33.0 33.1 33.2 33.3 33.4 33.5 33.6 33.7 33.8 33.9 34.0 34.1 34.2 34.3 34.4 34.5 34.6 34.7 34.8 34.9 35.0 35.1 35.2 35.3 35.4 35.5 35.6 35.7 35.8 35.9 36.0 36.1 36.2 36.3 36.4 36.5 36.6 36.7 36.8 36.9 37.0 27.5 30.0 32.5 35.0 37.5 29.4 29.6 29.8 30.0 30.2 30.4 30.6 30.8 31.0 31.2 31.4 31.6 31.8 32.0 32.2 32.4 32.6 32.8 33.0 33.2 33.4 33.6 33.8 34.0 34.2 34.4 34.6 34.8 35.0 35.2 35.4 35.6 35.8 36.0 36.2 36.4 36.6 36.8 37.0 y